Web Spam, Propaganda and Trust
Authors
Abstract
Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. It is also a serious problem for users, because they are not aware of it and tend to confuse trusting the search engine with trusting the results of a search [16]. The parallels between web spamming on the internet and propaganda in the real world suggest that anti-propaganda techniques can be used to educate users and to develop tools that help them evaluate the reliability of the information they find online. In this paper, we first analyze the effects that web spam has had on the evolution of search engines and the relationship of web spam to propagandistic techniques in society. Then we examine the neighborhoods of untrustworthy sites and find that a dense biconnected component (BCC) containing the site provides a reasonable trust neighborhood, one that has parallels in social network theory. Because spammers employ propagandistic techniques, we can design a heuristic that follows anti-propagandistic practices to recognize a spamming network. In society, recognizing a message as untrustworthy (in the opinion of a particular person or other social entity) is a reason to question the entities that recommend it. Entities found to strongly support more untrustworthy messages become untrustworthy themselves, so social distrust propagates backwards for a number of steps. Our heuristic simulates this behavior on the trust neighborhood of a spammer. In our experiments, we examined the trust neighborhoods of both trustworthy and untrustworthy web sites. Our findings suggest that spamming networks can be reliably recognized from their relationship to a single untrustworthy starting point by backward propagation of distrust.
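The abstract does not spell out the exact rule the heuristic uses, so the following is only a minimal Python sketch under stated assumptions. It assumes that hyperlinks are treated as undirected edges when computing biconnected components, that the trust neighborhood is the largest BCC containing the starting site, and that a page becomes untrustworthy when at least a hypothetical fraction `min_fraction` of its links inside the neighborhood point to already-distrusted pages. The function names and the parameters `min_fraction` and `max_steps` are illustrative, not the authors'.

```python
from collections import defaultdict


def biconnected_components(adj):
    """Hopcroft-Tarjan: return the node sets of all biconnected components
    of an undirected graph given as {node: set_of_neighbors}.
    Recursive DFS; very deep crawls may need an iterative version."""
    index, low = {}, {}
    edge_stack, comps = [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)            # discovery order
        for v in adj[u]:
            if v not in index:                    # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:            # u separates v's subtree
                    comp = set()
                    while True:
                        edge = edge_stack.pop()
                        comp.update(edge)
                        if edge == (u, v):
                            break
                    comps.append(comp)
            elif v != parent and index[v] < index[u]:  # back edge
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for node in adj:
        if node not in index:
            dfs(node, None)
    return comps


def trust_neighborhood(links, start):
    """Assumption: the largest BCC containing `start`, with links undirected."""
    adj = defaultdict(set)
    for src, dst in links:
        if src != dst:
            adj[src].add(dst)
            adj[dst].add(src)
    containing = [c for c in biconnected_components(dict(adj)) if start in c]
    return max(containing, key=len) if containing else {start}


def propagate_distrust(links, start, neighborhood, min_fraction=0.5, max_steps=3):
    """Spread distrust backwards from `start` to pages that recommend
    (link to) distrusted pages, for at most `max_steps` rounds."""
    outgoing = defaultdict(set)
    for src, dst in links:
        if src in neighborhood and dst in neighborhood and src != dst:
            outgoing[src].add(dst)
    distrusted = {start}
    for _ in range(max_steps):
        newly = set()
        for page in neighborhood - distrusted:
            out = outgoing.get(page, set())
            # Hypothetical rule: a page "strongly supports" distrusted pages
            # if at least min_fraction of its in-neighborhood links hit them.
            if out and len(out & distrusted) / len(out) >= min_fraction:
                newly.add(page)
        if not newly:
            break
        distrusted |= newly
    return distrusted
```

Chaining the two steps means passing the result of `trust_neighborhood(links, start)` as the `neighborhood` argument of `propagate_distrust`. Updates within a round are synchronous, so each round moves distrust exactly one link backwards, which mirrors the "number of steps" described above.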
Further, nodes involved in a spamming network can be divided into two groups: those whose content is similar to that of the starting site (so-called “link farms”) and those whose content is dissimilar (so-called “mutual admiration societies”). Our tool explores thousands of nodes within minutes and could be deployed at the browser level, making it possible to resolve, in favor of the end user, the moral question of who should decide to weed out spammers.
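The split into link farms and mutual admiration societies depends on a content-similarity measure that the abstract leaves unspecified. The sketch below assumes Jaccard similarity over lowercase word sets and a hypothetical cutoff `threshold=0.3`; `split_spam_network` and its inputs are illustrative names, and a real tool might use a different similarity measure entirely.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word set of a page's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty pages count as dissimilar."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_spam_network(pages, start, spam_nodes, threshold=0.3):
    """Split distrusted nodes by content similarity to the starting site.

    pages: {url: page_text}; threshold is a hypothetical cutoff.
    Returns (link_farm, mutual_admiration) as sets of URLs.
    """
    ref = tokens(pages[start])
    link_farm, mutual_admiration = set(), set()
    for node in spam_nodes:
        if node == start:
            continue
        sim = jaccard(ref, tokens(pages.get(node, "")))
        (link_farm if sim >= threshold else mutual_admiration).add(node)
    return link_farm, mutual_admiration
```

A set-based measure keeps the sketch dependency-free; a term-weighted measure such as TF-IDF cosine similarity would be a natural substitute if page lengths vary widely.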
Similar resources
Web Spam, Social Propaganda and the Evolution of Search Engine Rankings
Search engines have greatly influenced the way we experience the web. Since the early days of the web, users have been relying on them to get informed and make decisions. When the web was relatively small, web directories were built and maintained using human experts to screen and categorize pages according to their characteristics. By the mid-1990s, however, it was apparent that the human exp...
Propagating Trust and Distrust to Demote Web Spam
Web spamming describes behavior that attempts to deceive search engines’ ranking algorithms. TrustRank is a recent algorithm that can combat web spam by propagating trust among web pages. However, TrustRank propagates trust among web pages based on the number of outgoing links, which is also how PageRank propagates authority scores among web pages. This type of propagation may be suited for pro...
Ostra: Leveraging Trust to Thwart Unwanted Communication
Online communication media such as email, instant messaging, bulletin boards, voice-over-IP, and social networking sites allow any sender to reach potentially millions of users at near-zero marginal cost. This property enables information to be exchanged freely: anyone with Internet access can publish content. Unfortunately, the same property opens the door to unwanted communication, marketing,...
Enhancing Information Reliability through Backwards Propagation Of Distrust
Abstract—Search engines have greatly influenced the way we experience the web. Since the early days of the web, people have been relying on search engines to find useful information. However, their ability to provide useful and unbiased information can be manipulated by web spammers. Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searc...
Content Trust Model for Detecting Web Spam
As it gets easier to add information to the web via HTML pages, wikis, blogs, and other documents, it gets tougher to distinguish accurate or trustworthy information from inaccurate or untrustworthy information. Moreover, apart from inaccurate or untrustworthy information, we also need to anticipate web spam, where spammers publish false facts and scams to deliberately mislead users. Creating ...